A Novel Thresholding Method for Text Separation and Document Enhancement
Authors
Abstract
Many thresholding-based image enhancement techniques have been developed and used for document analysis, where the simplicity and efficiency of thresholding make it well suited to classifying layers within documents. However, the effectiveness of these enhancement techniques can be impaired by the variation of grey levels across documents, which causes over-thresholding or under-thresholding. This paper presents a novel global single-stage thresholding method for separating background and foreground layers in text documents. The method finds an optimum threshold value, or exact separation point, for each document using the relationship between the luminance value and the mean intensity of the document, without considering peak values in the grey-level histogram. The proposed method is tested on 50 historical documents and five specifically designed words, and then compared with three other efficient, well-known thresholding methods. Experimental results suggest that the proposed method performs well for text separation and enhancement of document images.
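The abstract does not give the formula that relates luminance to mean intensity, so the following is only a minimal sketch of the general idea of a global, single-pass, mean-based threshold that ignores histogram peaks. The Rec. 601 luminance weights, the `scale` factor, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def luminance(pixel: Pixel) -> float:
    """Approximate luminance using Rec. 601 weights (an assumption)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_based_threshold(gray: Sequence[Sequence[float]], scale: float = 1.0) -> float:
    """Return one global threshold: the mean intensity times `scale`.

    `scale` is a hypothetical tuning factor standing in for the paper's
    unstated luminance/mean relationship.
    """
    total = sum(sum(row) for row in gray)
    count = sum(len(row) for row in gray)
    if count == 0:
        raise ValueError("empty image")
    return scale * total / count


def binarize(image: Sequence[Sequence[Pixel]], scale: float = 1.0) -> List[List[int]]:
    """Label each pixel 0 (dark foreground text) or 1 (light background)."""
    gray = [[luminance(p) for p in row] for row in image]
    t = mean_based_threshold(gray, scale)
    return [[0 if value < t else 1 for value in row] for row in gray]


# Tiny 2x2 "page": light background on the left, dark ink on the right.
page = [
    [(250, 250, 250), (20, 20, 20)],
    [(240, 240, 240), (30, 30, 30)],
]
mask = binarize(page)
```

Because a single threshold is computed once per document, the cost is one pass over the pixels to get the mean and one pass to label them; no histogram peak search is involved.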
Related resources
Extraction, Enhancement and OCR
In this paper we address the problem of text extraction, enhancement and recognition in digital video. Compared with optical character recognition (OCR) from document images, text extraction and recognition in digital video presents several new challenges. First, the text in video is often embedded in complex backgrounds, making text extraction and separation difficult. Second, image data contain...
Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions. Both deteriorate the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...
Degraded Document Image Binarization Techniques
Document Image Binarization is performed in the preprocessing stage for document analysis and it aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR) and Document Image Retrieval (DIR). This research area has been studied...
Enhancement of Learning Based Image Matting Method with Different Background/Foreground Weights
The problem of accurate foreground estimation in images is called image matting. In image matting methods, a map is used as learning data, which is produced from those pixels that are definitely foreground, definitely background, and unknown. This three-level pixel map is often referred to as a trimap, which is produced manually in alpha matte datasets. The true class of unknown pixels will be es...
Removing Geometric Distortion from Text Using the Geometric Information of Text Lines
Document images produced by scanners or digital cameras usually have photometric and geometric distortions. If either of these effects distorts the document, recognition of words from such a document image using OCR is subject to errors. In this paper we propose a novel approach to significantly remove geometric distortion from document images. In this method first we extract document lines from do...